[WIP] Support Prism #347

Open
kzkn wants to merge 31 commits into master from prism

@kzkn kzkn commented Mar 1, 2025

Strategy

Swap the parsing backend from Ripper to Prism. The chosen approach is a full rewrite of the formatter (Rufo::PrismFormatter in lib/rufo/prism_formatter.rb), not a parser-only swap.

Rejected alternative: keep lib/rufo/formatter.rb (4221 lines) and adapt it to consume Prism AST nodes in place of Ripper sexp.

Reasons:

  • Prism AST shape differs significantly from Ripper sexp: named fields vs. positional arrays, different node decomposition for heredocs, comments, and operator calls.
  • The existing formatter is intertwined with Ripper's token stream and recovery quirks (lex/sexp dual access, sexp_unparsable_code fallback, token-position-driven comment placement).
  • A rewrite forces a clean re-evaluation of each formatting decision (alignment options, indent semantics, comment interleaving) against the new AST instead of patching old assumptions.
  • Cost is paid once; the result is a smaller engine whose state can be reasoned about node by node.
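
To make the first reason concrete, here is a small comparison of how the two parsers expose the same call. It is only an illustrative sketch, not code from this branch: it assumes a Ruby where both `ripper` and the `prism` gem can be required, the helper names `ripper_call_name` and `prism_call_name` are made up for the example, and the exact sexp layout may differ between Ruby versions.

```ruby
require "ripper"
require "prism"

# Ripper: nested positional arrays. The meaning of each slot depends on
# the node type stored in slot 0, so consumers index by position.
def ripper_call_name(source)
  call = Ripper.sexp(source)[1][0] # first statement of [:program, [...]]
  raise ArgumentError, "expected a :call sexp" unless call[0] == :call
  call[3][1]                       # the string inside [:@ident, "foo", [line, col]]
end

# Prism: typed node objects with named accessors.
def prism_call_name(source)
  call = Prism.parse(source).value.statements.body.first
  raise ArgumentError, "expected a CallNode" unless call.is_a?(Prism::CallNode)
  call.message_loc.slice
end

ripper_call_name("a.foo") # => "foo"
prism_call_name("a.foo")  # => "foo"
```

Both helpers return the same string, but the Ripper version has to know that slot 3 of a `:call` sexp holds the method identifier, while the Prism version asks the node for `message_loc` by name.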

Both engines coexist via engine: :prism on Rufo.format. The default remains Ripper until PrismFormatter reaches feature parity. Default-flip criteria (to be confirmed): all checklist files migrated, no behavior regression on a sample corpus, no significant performance regression.

Architecture

  • lib/rufo/prism_formatter.rb — visitor-based formatter over Prism AST.
  • Source-offset cursor (@source_offset) drives comment interleaving: comments live in parse_result.comments (outside the AST), so the visitor drains them before each node enters output, classifying standalone vs trailing by inspecting the source line.
  • Layout primitives: write, write_newline, indent_by, with a pending-indent flag so nodes do not need to know their own indent.
  • Heredoc queue: <<EOS opening is written in place; body and closing are appended on the next write_newline (their source locations live separately from the opening's).
  • Non-fatal Prism errors: errors on a small whitelist (:invalid_block_exit, :invalid_retry_without_rescue) are treated as semantic-validity issues rather than parse failures, matching the legacy formatter's behavior on syntactically-valid-but-semantically-invalid code.

Spec sync policy

spec/lib/rufo/prism_formatter_source_specs/ mirrors the layout of spec/lib/rufo/formatter_source_specs/. Verbatim copy when every case passes; hand-curated subset when only some pass; no file when the topic is not yet supported. The header comment in spec/lib/rufo/prism_formatter_spec.rb documents the sync workflow.

Progress

  • spec/lib/rufo/formatter_source_specs/3.1/endless_methods.rb.spec
  • spec/lib/rufo/formatter_source_specs/3.1/method_definition.rb.spec
  • spec/lib/rufo/formatter_source_specs/3.1/pattern_matching.rb.spec
  • spec/lib/rufo/formatter_source_specs/3.1/valueless_hash.rb.spec
  • spec/lib/rufo/formatter_source_specs/3.2/method_definition.rb.spec
  • spec/lib/rufo/formatter_source_specs/alias.rb.spec
  • spec/lib/rufo/formatter_source_specs/align_assignments.rb.spec
  • spec/lib/rufo/formatter_source_specs/align_case_when.rb.spec
  • spec/lib/rufo/formatter_source_specs/align_chained_calls.rb.spec
  • spec/lib/rufo/formatter_source_specs/align_comments.rb.spec
  • spec/lib/rufo/formatter_source_specs/align_hash_keys.rb.spec
  • spec/lib/rufo/formatter_source_specs/align_mix.rb.spec
  • spec/lib/rufo/formatter_source_specs/and_or_not.rb.spec
  • spec/lib/rufo/formatter_source_specs/array_access.rb.spec
  • spec/lib/rufo/formatter_source_specs/array_literal.rb.spec
  • spec/lib/rufo/formatter_source_specs/array_setter.rb.spec
  • spec/lib/rufo/formatter_source_specs/assignment_operators.rb.spec
  • spec/lib/rufo/formatter_source_specs/assignments.rb.spec
  • spec/lib/rufo/formatter_source_specs/backtick_strings.rb.spec
  • spec/lib/rufo/formatter_source_specs/begin_end.rb.spec
  • spec/lib/rufo/formatter_source_specs/BEGIN.rb.spec
  • spec/lib/rufo/formatter_source_specs/begin_rescue_end.rb.spec
  • spec/lib/rufo/formatter_source_specs/binary_operators.rb.spec
  • spec/lib/rufo/formatter_source_specs/blocks.rb.spec
  • spec/lib/rufo/formatter_source_specs/booleans.rb.spec
  • spec/lib/rufo/formatter_source_specs/break.rb.spec
  • spec/lib/rufo/formatter_source_specs/calls_with_dot.rb.spec
  • spec/lib/rufo/formatter_source_specs/calls_with_receiver.rb.spec
  • spec/lib/rufo/formatter_source_specs/case.rb.spec
  • spec/lib/rufo/formatter_source_specs/chars.rb.spec
  • spec/lib/rufo/formatter_source_specs/class_into_self.rb.spec
  • spec/lib/rufo/formatter_source_specs/class.rb.spec
  • spec/lib/rufo/formatter_source_specs/class_rescue_end.rb.spec
  • spec/lib/rufo/formatter_source_specs/class_variables.rb.spec
  • spec/lib/rufo/formatter_source_specs/comments.rb.spec
  • spec/lib/rufo/formatter_source_specs/constants.rb.spec
  • spec/lib/rufo/formatter_source_specs/defined.rb.spec
  • spec/lib/rufo/formatter_source_specs/double_newline_inside_type.rb.spec
  • spec/lib/rufo/formatter_source_specs/double_quotes.rb.spec
  • spec/lib/rufo/formatter_source_specs/endless_methods.rb.spec
  • spec/lib/rufo/formatter_source_specs/END.rb.spec
  • spec/lib/rufo/formatter_source_specs/flip_flop.rb.spec
  • spec/lib/rufo/formatter_source_specs/floats.rb.spec
  • spec/lib/rufo/formatter_source_specs/for.rb.spec
  • spec/lib/rufo/formatter_source_specs/global_variables.rb.spec
  • spec/lib/rufo/formatter_source_specs/hash_literal.rb.spec
  • spec/lib/rufo/formatter_source_specs/heredoc.rb.spec
  • spec/lib/rufo/formatter_source_specs/if.rb.spec
  • spec/lib/rufo/formatter_source_specs/imaginaries.rb.spec
  • spec/lib/rufo/formatter_source_specs/inline_classes.rb.spec
  • spec/lib/rufo/formatter_source_specs/integers.rb.spec
  • spec/lib/rufo/formatter_source_specs/junk_drawer.rb.spec
  • spec/lib/rufo/formatter_source_specs/keyword_arguments.rb.spec
  • spec/lib/rufo/formatter_source_specs/lambdas.rb.spec
  • spec/lib/rufo/formatter_source_specs/leading_newlines.rb.spec
  • spec/lib/rufo/formatter_source_specs/lonely_operator.rb.spec
  • spec/lib/rufo/formatter_source_specs/lonely_property_setters.rb.spec
  • spec/lib/rufo/formatter_source_specs/lonely.rb.spec
  • spec/lib/rufo/formatter_source_specs/lonely_receiver_and_block.rb.spec
  • spec/lib/rufo/formatter_source_specs/method_calls.rb.spec
  • spec/lib/rufo/formatter_source_specs/method_definition.rb.spec
  • spec/lib/rufo/formatter_source_specs/method_definition_with_receiver.rb.spec
  • spec/lib/rufo/formatter_source_specs/mixed_quotes.rb.spec
  • spec/lib/rufo/formatter_source_specs/module.rb.spec
  • spec/lib/rufo/formatter_source_specs/multiline_comments.rb.spec
  • spec/lib/rufo/formatter_source_specs/multiple_assignments.rb.spec
  • spec/lib/rufo/formatter_source_specs/next.rb.spec
  • spec/lib/rufo/formatter_source_specs/nil.rb.spec
  • spec/lib/rufo/formatter_source_specs/parens_in_def.rb.spec
  • spec/lib/rufo/formatter_source_specs/parens.rb.spec
  • spec/lib/rufo/formatter_source_specs/pattern_matching.rb.spec
  • spec/lib/rufo/formatter_source_specs/percent_array_literal.rb.spec
  • spec/lib/rufo/formatter_source_specs/property_setters.rb.spec
  • spec/lib/rufo/formatter_source_specs/range.rb.spec
  • spec/lib/rufo/formatter_source_specs/rationals.rb.spec
  • spec/lib/rufo/formatter_source_specs/received_and_block.rb.spec
  • spec/lib/rufo/formatter_source_specs/receiver_and_block.rb.spec
  • spec/lib/rufo/formatter_source_specs/redo.rb.spec
  • spec/lib/rufo/formatter_source_specs/refinements.rb.spec
  • spec/lib/rufo/formatter_source_specs/regex.rb.spec
  • spec/lib/rufo/formatter_source_specs/retry.rb.spec
  • spec/lib/rufo/formatter_source_specs/return.rb.spec
  • spec/lib/rufo/formatter_source_specs/semicolons.rb.spec
  • spec/lib/rufo/formatter_source_specs/single_quotes.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_after_comma.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_after_lambda_arrow.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_after_method_name.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_around_binary.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_around_block_brace.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_around_dot.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_around_equal.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_around_hash_arrow.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_around_unary.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_around_when.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_in_commands.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_in_inline_expressions.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_inside_array_bracket.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_inside_hash_brace.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_in_suffix.rb.spec
  • spec/lib/rufo/formatter_source_specs/spaces_in_ternary.rb.spec
  • spec/lib/rufo/formatter_source_specs/special_global_variables.rb.spec
  • spec/lib/rufo/formatter_source_specs/squiggly_heredoc.rb.spec
  • spec/lib/rufo/formatter_source_specs/string_literal_concatenation.rb.spec
  • spec/lib/rufo/formatter_source_specs/string_literals.rb.spec
  • spec/lib/rufo/formatter_source_specs/suffix_if.rb.spec
  • spec/lib/rufo/formatter_source_specs/suffix_rescue.rb.spec
  • spec/lib/rufo/formatter_source_specs/super.rb.spec
  • spec/lib/rufo/formatter_source_specs/symbol_literals.rb.spec
  • spec/lib/rufo/formatter_source_specs/ternaries.rb.spec
  • spec/lib/rufo/formatter_source_specs/trailing_commas.rb.spec
  • spec/lib/rufo/formatter_source_specs/unary_operators.rb.spec
  • spec/lib/rufo/formatter_source_specs/undef.rb.spec
  • spec/lib/rufo/formatter_source_specs/unless.rb.spec
  • spec/lib/rufo/formatter_source_specs/until.rb.spec
  • spec/lib/rufo/formatter_source_specs/variables.rb.spec
  • spec/lib/rufo/formatter_source_specs/visibility_indent.rb.spec
  • spec/lib/rufo/formatter_source_specs/visibility_markers.rb.spec
  • spec/lib/rufo/formatter_source_specs/while.rb.spec
  • spec/lib/rufo/formatter_source_specs/yield.rb.spec

kzkn and others added 30 commits March 1, 2025 11:59
The previous implementation wrote the message before the receiver,
which produced "fooa" for "a.foo". Branch on call_operator_loc to
distinguish method calls (receiver, op, message) from unary prefix
operators (message, receiver). Add a regression spec covering the
simple dot, chained dots, and safe navigation cases.
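
The branch described in this commit can be sketched roughly as follows. This is not the code from the PR: `call_parts` and `first_call` are hypothetical helpers, the sketch assumes the `prism` gem is available, and it only covers the cases named above (binary operators, which also lack a call operator, are out of scope).

```ruby
require "prism"

# Returns the source fragments of a call in the order they should be written.
def call_parts(node)
  message = node.message_loc.slice
  return [message] if node.receiver.nil? # bare call such as `foo`

  receiver = node.receiver.location.slice
  if node.call_operator_loc
    # Method call: receiver, operator (. or &.), message.
    [receiver, node.call_operator_loc.slice, message]
  else
    # Unary prefix operator: message (the operator), then receiver.
    [message, receiver]
  end
end

# Parses `source` and returns its first top-level statement.
def first_call(source)
  Prism.parse(source).value.statements.body.first
end

call_parts(first_call("a.foo")).join  # => "a.foo"
call_parts(first_call("a&.foo")).join # => "a&.foo"
call_parts(first_call("!a")).join     # => "!a"
```

Writing the message first unconditionally would turn the first example into "fooa", which is the bug this commit fixes.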

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DEBUG was hard-coded to true and printed `p [:debug, ...]` on every
parse, polluting test output. Read RUFO_PRISM_DEBUG from the env so
local debugging stays opt-in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prism exposes comments outside the AST (parse_result.comments), so
the visitor needs to drain them at the right points to preserve
order against AST nodes.

Track @source_offset (position past the last source bytes already
accounted for). write_code_at now drains comments before the
location, writes the source slice, and advances the cursor.
visit_statements_node drains before each child as well, so
standalone comments above a statement land on their own line.
PrismFormatter#format calls visitor.finish at the end to drain
remaining comments past the last statement.
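
A minimal sketch of the cursor idea, independent of Prism: `SketchComment` and `CommentCursor` are hypothetical names, and the comment objects are plain structs standing in for whatever the parser reports. The real formatter also writes the drained comments; this sketch only shows the ordering.

```ruby
# Hypothetical stand-in for a parser comment: byte offset plus text.
SketchComment = Struct.new(:start_offset, :text)

class CommentCursor
  attr_reader :source_offset

  def initialize(comments)
    @comments = comments.sort_by(&:start_offset)
    @source_offset = 0
  end

  # Removes and returns every comment that starts before `offset`,
  # moving the cursor past each one.
  def drain_before(offset)
    drained = []
    while (comment = @comments.first) && comment.start_offset < offset
      drained << @comments.shift
      @source_offset = comment.start_offset + comment.text.bytesize
    end
    drained
  end

  # Marks source up to `end_offset` as written (the role write_code_at plays).
  def advance_to(end_offset)
    @source_offset = [@source_offset, end_offset].max
  end

  # Drains whatever is left after the last statement.
  def finish
    drain_before(Float::INFINITY)
  end
end
```

For the source "# a\nx\n# b\n", draining before the statement at offset 4 yields the first comment, and `finish` picks up the one that follows the last statement.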

emit_comment classifies each comment by what precedes it on its
source line: whitespace-only -> standalone (own line at current
position), otherwise trailing (preserve the gap from the previous
emitted source position).
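
The classification rule can be illustrated with a few lines of plain Ruby. `classify_comment` is a hypothetical helper, not the PR's emit_comment; it assumes ASCII source so that byte offsets and character offsets coincide.

```ruby
# Classifies a comment by what precedes it on its own source line.
# `offset` is the position where the comment's `#` starts.
def classify_comment(source, offset)
  before = source[0...offset]
  line_start = before.rindex("\n")&.succ || 0 # index just after the previous newline
  line_prefix = before[line_start..]
  line_prefix.match?(/\A[ \t]*\z/) ? :standalone : :trailing
end

source = "# top\nfoo # tail\n  # indented\n"
classify_comment(source, source.index("# top"))      # => :standalone
classify_comment(source, source.index("# tail"))     # => :trailing
classify_comment(source, source.index("# indented")) # => :standalone
```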

visit_nil/true/false/instance_variable_read switch to
write_code_at(node.location) for consistent cursor tracking;
visit_call_node now uses write_code_at(message_loc) for the same
reason. visit_local_variable_write_node and visit_undef_node
manually consume up to their start so leading standalone comments
are not skipped.

Adds a comments.rb.spec covering the cases that do not require
blank-line preservation: single standalone, two consecutive
standalones, trailing-after-code, standalone-before-code, and
trailing-then-standalone. Blank-line preservation between comments
needs layout state (the next step) and is deferred until then.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace bare @output << x with a small write API:
- write: appends content and emits the pending indent if we are at
  the start of a line.
- write_newline / write_newline_unless_pending: ends a line, marks
  indent as pending for the next write.
- indent_by(amount) { ... }: scoped indent push.

write_code_at flows through write, so cursor tracking and indent
emission stay consistent for the existing literal visitors.
emit_comment uses write/write_newline too, so standalone comments
land at the current indent.

Add visit_if_node as the first non-literal node to validate the
state machine: predicate -> newline -> indent_by(INDENT_SIZE) -> body
-> newline -> end. Spec covers indent introduction on a flat body,
preservation of an already-indented body, double indent for nested
if, and an empty body.
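
As a rough illustration of the write API and the if-node state machine, the following standalone sketch models the pending-indent flag. `LayoutWriter` and `write_if` are hypothetical names, and the exact point where write_newline_unless_pending is used is an assumption rather than something taken from the PR.

```ruby
class LayoutWriter
  INDENT_SIZE = 2

  attr_reader :output

  def initialize
    @output = +""
    @indent = 0
    @pending_indent = true # at the start of a line, indent not yet written
  end

  # Appends content, emitting the pending indent first if needed.
  def write(content)
    if @pending_indent
      @output << " " * @indent
      @pending_indent = false
    end
    @output << content
  end

  # Ends the line and marks the indent as pending for the next write.
  def write_newline
    @output << "\n"
    @pending_indent = true
  end

  def write_newline_unless_pending
    write_newline unless @pending_indent
  end

  # Scoped indent push: restored even if the block raises.
  def indent_by(amount)
    @indent += amount
    yield
  ensure
    @indent -= amount
  end
end

# predicate -> newline -> indented body -> newline -> end
def write_if(writer, predicate)
  writer.write("if #{predicate}")
  writer.write_newline
  writer.indent_by(LayoutWriter::INDENT_SIZE) { yield }
  writer.write_newline_unless_pending
  writer.write("end")
  writer.write_newline
end

writer = LayoutWriter.new
write_if(writer, "a") { write_if(writer, "b") { writer.write("c") } }
writer.output # => "if a\n  if b\n    c\n  end\nend\n"
```

Because the indent is emitted lazily by `write`, the body block never needs to know how deeply it is nested.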

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prism gives heredocs three locations: opening (<<EOS), content
(body), and closing (the EOS terminator line). The node's
location only spans the opening — the body and closing live
later in source on their own lines.

Detect heredocs in visit_string_node by the opening_loc.slice
prefix (<<). Write the opening verbatim, push the node onto a
@pending_heredocs queue, and advance the source cursor past
closing_loc.end_offset so the cursor stays consistent with
emit position.

write_newline flushes the queue right after emitting "\n",
which is the correct insertion point: the body must follow
the line that opened the heredoc. finish() also flushes so a
trailing heredoc at EOF is not lost. The body/closing are
copied verbatim from source, which is the correct behavior
for all three forms (<<, <<-, <<~) because Prism's location
ranges already encode the desired indentation.

Spec covers bare, assigned, dash, and squiggly forms, plus a
heredoc followed by a statement and one followed by a
standalone comment then a statement.
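
The queueing behavior can be sketched without Prism. `HeredocQueueWriter` is a hypothetical class; it takes the body and closing line as one verbatim string, whereas the PR slices them out of the source using the node's locations.

```ruby
class HeredocQueueWriter
  attr_reader :output

  def initialize
    @output = +""
    @pending_heredocs = []
  end

  def write(text)
    @output << text
  end

  # Writes the opening (e.g. "<<~EOS") in place and queues the body plus
  # closing line, passed here as one verbatim source slice.
  def write_heredoc(opening, body_and_closing)
    write(opening)
    @pending_heredocs << body_and_closing
  end

  # The queue is flushed right after the newline that ends the opening line.
  def write_newline
    @output << "\n"
    flush_heredocs
  end

  # A heredoc still queued at EOF is flushed here so it is not lost.
  def finish
    write_newline unless @pending_heredocs.empty?
    @output
  end

  private

  def flush_heredocs
    @output << @pending_heredocs.shift until @pending_heredocs.empty?
  end
end

writer = HeredocQueueWriter.new
writer.write("x = ")
writer.write_heredoc("<<~EOS", "  hello\nEOS\n")
writer.write_newline
writer.write("y = 1")
writer.write_newline
writer.finish # => "x = <<~EOS\n  hello\nEOS\ny = 1\n"
```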

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DEBUG and both debug_log methods were development-only print
helpers. The env-gated form added in 6a was still dead weight in
the shipped formatter; drop the constant, the two method
definitions, and the lone call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RedoNode / RetryNode are bare keywords — write_code_at(node.location).
AliasMethodNode and AliasGlobalVariableNode share the same shape
(keyword, new_name, old_name with single spaces); share a private
visit_alias helper.

Prism flags top-level redo/retry as :syntax-level errors even though
it still builds a complete AST (semantic validity, not parse
failure). The existing Ripper-based formatter formats these inputs
fine. Add a NON_FATAL_ERROR_TYPES list so PrismFormatter skips
:invalid_block_exit and :invalid_retry_without_rescue when deciding
whether to raise. Other :syntax errors (e.g. dynamic constant
assignment in def) still raise, preserving the existing behavior.
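
A minimal sketch of that filtering, assuming only that each error object responds to `type`. `SketchError`, `fatal_errors`, and `ensure_formattable!` are hypothetical names, `:hypothetical_fatal_error` is a placeholder type, and ArgumentError stands in for whatever error class the formatter actually raises.

```ruby
NON_FATAL_ERROR_TYPES = %i[invalid_block_exit invalid_retry_without_rescue].freeze

# Hypothetical stand-in for a parser error object that responds to #type.
SketchError = Struct.new(:type, :message)

# Returns only the errors that should abort formatting.
def fatal_errors(errors)
  errors.reject { |error| NON_FATAL_ERROR_TYPES.include?(error.type) }
end

# Raises if any fatal error remains; otherwise formatting may proceed.
def ensure_formattable!(errors)
  fatal = fatal_errors(errors)
  raise ArgumentError, fatal.map(&:message).join("; ") unless fatal.empty?
  true
end

ensure_formattable!([SketchError.new(:invalid_retry_without_rescue, "retry outside rescue")])
# => true
```

Passing an error whose type is not on the list, such as `SketchError.new(:hypothetical_fatal_error, "boom")`, makes `ensure_formattable!` raise.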

Specs are copied verbatim from formatter_source_specs/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
visit_unless_node mirrors visit_if_node but also dispatches the
optional else_clause (an ElseNode) before writing end. Add
visit_else_node as a shared helper that emits "else", a newline,
the indented body, and a trailing newline; visit_if_node will
reuse it once elsif chains are designed.

Both unless cases from formatter_source_specs (with and without
an empty else) format and round-trip correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Safe-navigation cases (foo &. bar / foo&. bar) round-trip through
the existing visit_call_node with no new code.

XStringNode, RegularExpressionNode, and InterpolatedRegularExpressionNode
are all "copy node.location verbatim" cases for the topics covered
in the specs; add the three visitor methods and copy the
corresponding spec files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a header comment explaining the relationship between
prism_formatter_source_specs/ and formatter_source_specs/:
verbatim copies when fully supported, hand-curated subsets
when partial, no file when unsupported. Syncing is manual
until PrismFormatter approaches parity and a per-engine
PENDING marker can replace the dual-directory arrangement.

This is the lower-impact half of the original 6b plan. A
true consolidation (single spec dir + per-engine pending
markers) would require extending the spec parser and is
deferred until the divergence between engines shrinks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove unused code_at — every caller already uses write_code_at.
- Drop the amount parameter on indent_by and rename to indent. The
  only argument ever passed was the INDENT_SIZE constant; pull the
  constant inside the helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>